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Abstract — It is sliown how expectation maximization (EM) may 
be viewed as a message passing algorittim in factor graplis. In 
particular, a general EM message computation rule is identified. 
As a factor graph tool, EM may be used to break cycles in 
a factor graph, and tractable messages may in some cases be 
obtained where the sum-product messages are unwieldy. 

As an exemplary application, the paper considers linear Gaus- 
sian state space models. Unknown coefficients in such models give 
rise to multipliers in the corresponding factor graph. A main 
attraction of EM in such cases is that it results in purely Gaussian 
message passing algorithms. These Gaussian EM messages are 
tabulated for several (scalar, vector, matrix) multipliers that 
frequently appear in applications. 

Index Terms — Expectation maximization, factor graphs, mes- 
sage passing. 

I. Introduction 

Graphical models [1] in general and factor graphs [2]-[5] 
in particular provide a notation for structured system models 
that helps to describe and to develop algorithms for detection 
and estimation problems. A large variety of algorithms can be 
viewed as message passing algorithms that operate by passing 
locally computed "messages" along the edges of the factor 
graph. 

Expectation maximization (EM) [6]-[9] is an iterative tech- 
nique for parameter estimation which is widely used in statis- 
tics and signal processing. EM is a standard tool for parameter 
estimation in graphical models [10], [11], but EM has not 
traditionally been viewed as a message passing algorithm. 
Examples in communications include turbo synchronization 
[12]-[14], joint channel estimation and symbol detection [15]- 
[17], and distributed source coding [18]. 

An explicit formulation of a "factor graph EM algorithm" 
was proposed in [19] and [20], and a full description of EM 
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as a message passing algorithm with a general local message 
computation rule was presented in [21], which is the basis of 
the present paper. A similar approach was also pursued by 
O' Sullivan [22] and by Herzet et al. [14]. 

In a parallel development, Winn and Bishop made the 
important observation that variational inference can be put into 
message passing form [23], [24], and similar observations were 
made also in [25] and [26]. In fact, EM message passing may 
be viewed as a special case of variational message passing 
[27]. However, EM is not specifically addressed (and not even 
mentioned) in [23]-[25]. 

In this paper and its companion paper [28], we develop 
the EM algorithm as a general message passing technique 
for factor graphs. This formulation may be helpful in several 
different ways: 

> EM may be used to estimate unknown parameters in a 
factor graph model. 

> EM may be used to break cycles in a factor graph. 

• The EM messages are tractable expressions in some 
cases where the sum-product and max-product message 
computation rules yields intractable expressions. 

• Tabulated EM messages for frequently occuring 
nodes /factors allow the composition of nontrivial 
EM algorithms without additional computations or 
derivations. 

Conversely, the flexibility of the factor graph approach sug- 
gests many variations and extensions of the EM algorithm 
itself, as will be discussed in Section VI and in [28]. More- 
over, the EM message passing algorithm may be seamlessly 
combined with sum-product and max-product message passing 
in various ways. 

This paper begins with a brief review of standard EM in 
Section II and a detailed development of message passing 
EM in Section III. As quite some time has passed since the 
publication of [19]-[21], this part of the paper is perhaps 
mainly tutorial. 

In Section IV, we illustrate message passing EM by its 
application to linear Gaussian models (in particular, FIR filters 
and autoregressive filters) with unknown coefficients. In these 
examples, the EM messages turn out to be Gaussian, which 
yields a fully Gaussian algorithm for these nonlinear problems. 

These examples also illustrate the use of tabulated EM 
message computation rules. The derivation of the EM message 
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for a particular application is often not trivial and tables 
of precomputed EM messages can therefore be helpful. In 
Section V, we present tables of EM messages out of various 
"multipliers" that arise naturally in Unear Gaussian models 
with unknown coefficients. 

The proofs of these tabulated message computation rules 
are given in Appendices C-E. Appendices D and E rely on 
Gaussian sum-product messages tabulated in [5], which further 
illustrates the use of tabulated message computation rules. 

Some concluding remarks are offered in Section VI. 

The companion paper [28] begins with discrete variables 
and makes a tour through EM algorithms ranging from hidden 
Markov models to independent factor analysis. 

In this paper, we will use Forney-style factor graphs (also 
called normal factor graphs) as in [4] and [5], a variation 
due to Forney [29] of factor graphs as in [3]. The reader 
is specifically referred to [5] for details of the factor graph 
notation. In particular, we will use arrows (as in 7? and /T) 
for sum-product messages, and we will use capital letters for 
unknown variables (i.e., functions of the configuration space) 
and lower-case letters for particular values of a variable. 

From Section IV onward, multivariate Gaussian distribu- 
tions will be prominent. Such distributions will be parameter- 
ized either by a mean vector m and a covariance matrix V or 
by the inverse covariance matrix ("weight matrix") W = 
and the transformed mean vector Wm. For Gau^ian mes- 
sages, these parameters will be denoted by to, V, etc., as 
in [5]. We will sometimes allow messages to be degenerate 
(non-integrable) "Gaussians" e~2*^^ Wx-2x Wm) Yvhere the 
weight matrix W is positive semi-definite and singular rather 
than positive definite. 

II. Review of the EM Algorithm 

We begin by reviewing the EM algorithm in a setting which 
is suitable for the purpose of this paper. Suppose we wish to 
find 

^max = argmax/(6') (1) 
e 

for some function / : M" ^ M. We assume that f{0) is the 
"marginal" of some real-valued function f{x,6), i.e.. 



f{e)= / f{x,e)dx 



(2) 



where g{x) dx denotes integration of g{x) over the whole 
range of x. (The integral in (2) may be replaced by a 
sum if X is discrete, with obvious corresponding changes in 
subsequent expressions.) The function f{x,9) is assumed to 
be normegative: 



f{x, e)>0 for all X and all 9. 



(3) 



In addition, we assume < f{9) < oo for all 6. In 
other words, for any fixed 9, f{x,6)/f{9) is a probabil- 
ity density over x. We will also assume that the integral 

/(x, 9) log fix, 9') dx exists for all 9, 9'. 

The EM algorithm attempts to compute (1) as follows: 

1) Make some initial guess 



e 



/a 



/b 



Fig. 1. Factor graph of (7) with EM message e^^^^ . 

2) Expectation step: evaluate 

f{k)(e)^ j f{x,9^''^)logf{x,9)dx. 

(The base of the logarithm is immaterial.) 

3) Maximization step: compute 

^(fe+i) ^argmax/('=)(^). 



(4) 



(5) 



4) Repeat 2-3 until convergence or until the available time 
is over. 

The main property of the EM algorithm is 

> f^§ik)y (6) 

For the reader's convenience, a concise proof of (6) is given 
in Appendix A. In many apphcations, the expressions (4) and 
(5) turn out to be quite manageable and simpler than the direct 
maximization (1). 

In typical apphcations, f{x,9) is extended to f{x,y,9), 
where y is known and fixed. The function f{x, y, 9) is either a 
probability density over x and y with parameter 9 or it is a joint 
probability density over x, y, and 9. In the EM literature, y is 
called the observed data, x is called the missing (unobserved) 
data, and the pair [x, y) is called the complete data. 

III. EM AS A Message Passing Algorithm 

We now consider EM in factor graphs. We will do this in 
several steps. The resulting message passing algorithm will be 
summarized in Section 111-E. 

We henceforth assume that all logarithms are natural loga- 
rithms. 

A. Trivial Factor Graph 

We first consider a trivial factorization 



I{x,9) = !^{9)Mx,9). 



(7) 



the factor graph of which is shown in Fig. 1. (In typical 
apphcations, fA{9) is either a prior probabihty or constant.) In 
this setup, the EM algorithm amounts to iterative computation 
of a downward message ^'^'^^ and an upward message e''^^^ as 
follows. 



Upward message (EM message): e'^^^^ with 

JjB{x,9^''^)logfB{x,9)dx 



rj{9) 



/^/B(a;,^W)rfa; 
= Ej,^[\ogfB{X,9)], 



(8) 
(9) 
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where Ep^ denotes the expectation with respect to the proba- 
biUty distribution 



/b(x,^W) 



Downward message: 

Q{k+i) =argniax(/A(e)-e''(^)) 

= argniax(log/A(^) + ??(^)) . 



(10) 

(11) 
(12) 



The equivalence of this message passing algorithm with (4) 
and (5) may be seen as follows. From (4) and (5), we have 

= argmax //(a;,^('=))log/(x,6»)da; (13) 
e Jx 

= argmax / fA{e^^^)h{x,6^^^) 
e Jx 

■ \og{fA{e)fB{x,e))dx (14) 

= argmax / fB{x,e^'''>) 
e Jx 

■(\ogfK{e)+\ogfB{x,e))dx (15) 

= argmax log /a (6') 



+ 



/^/B(.T,gW)l0g/B(x,^)dx 

j^,h{x',ei'^))dx' 



(16) 



which is equivalent to (8) and (12). 
Some remarks: 

1) The quantity r]{9) may be viewed as a "log-domain" 
summary of /b. The corresponding "probabiUty do- 
main" summary e^^^^ is consistent with the semantics 
of factor graphs where messages are "summaries" of 
factors (cf. (11) and (22)). We wiU refer to e^(^) as the 
EM message. 

2) A constant may be added to r}{9) without affecting (12). 

3) If fxiO) is constant, the normalization in (8) can be 
omitted. More generally, the normalization in (8) can be 
omitted if /a(^) is constant for all 9 such that /a(^) 
(i.e., if /a(^) expresses a constraint); this case occurs 
in many appUcations. 

4) Nothing changes if we introduce a known observation 
(i.e., a constant argument) y into / such that (7) becomes 
f{x,y,9)=fA{y,e)fB{x,y,9). 

B. Nontrivial Factor Graph 

We now come to the heart of the matter: if is a vector, 9 = 
(^1, 6*2, ■ • ■), and if /b has a nontrivial factor graph, then the 
EM message e''^^) splits into messages e''^'^^^ e''^'^^), . . .that 
can be computed "locally" in the factor graph of /b- 

To see this, consider the following example (which actually 
covers the general case). Let 9 = (^i, ^2). let a; = {xi,X2,x^), 
and let 




Fig. 2. Factor graph of (17), a refinement of Fig. 1. 



the factor graph of which is shown in Fig. 2. In this case, (9) 
splits into 

??(^l,^2) = Epjlog (/c(Xi,X2,ei)/D(X2,X3,e2)) 



with 
and 



m(^l)+??2(^2) 
77l(0l) = Ep3[log/c(Xi,X2,0l)] 



??2(^2)=Ep3[l0g/D(X2,X3,e2)]. 

The EM message e''^^^ thus factors as 



g'7(«l,«2) = gm (Si) 6^/2 (02) 



(18) 
(19) 

(20) 

(21) 

(22) 

and the factors e''^^^^^ and e'^^^^^^ may be viewed as upward 
messages along the edge ©1 and 82, respectively, in the factor 
graph of Fig. 2. The downward messages in Fig. 2 are the 
estimates 

^^(fe+i)^^(fc+i)^ = argmax /A(^i,^2)e''^^^^^e''^(^^) (23) 

{61,02) 

as is obvious from (11) and (22). 

The expectation in (20) may be computed with respect to 
the probability distribution 

Pb{xuX2\9^''^)= f PB{xi,X2,X3\e^''^)dX3, (24) 

J X3 

which is the marginal of pB with respect to the arguments 
of /c, and the expectation in (21) may be computed with 
respect to the probability distribution 

PBiX2,X3\9^''^)= I pBixi,X2,X3\e^''^)dXu (25) 

J xi 

which is the marginal of pb with respect to the arguments 
of /d. 

Going through this derivation, we note that the generaliza- 
tion to an arbitrary factor graph for /b is immediate. Note, 
in particular, that the splitting of the expectation in (19) 
does not assume that the factor graph of /b is cycle-free. If 
g{x-i,. . . , Xm, 9g) is a generic node /factor in the factor graph 
of /b, we obtain r]g{9g) as in (I.l) and (1.2) in Table I with 



Plocal(2Jl , . . . , 



fB{x,9) = fc{xi,X2,9i)fT>{X2,X3,92), 



(17) 



-I 

J x:xi...Xn 



.fixed 



PBix\e)dx (26) 
fB{x,e)dx, (27) 



, fixed 
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TABLE I 

EM MESSAGE e''9 (^3 ) OUT OF A GENERIC NODE /FACTOR g. 



©9 








T — » -^"^ 


t7 



with 



[iogs(Xi,...,x^,eg)] 

Piocai{a;i,---,a;m|^) 

xi,...,Xtti 

logg(xi, . . . ,Xm,6g) dx-i ■ 



L 



(1.1) 



(1.2) 



(X g{xi_, ...,Xm,dg) Jtxi (a;i) ■ • ■ 7?x„ (a^m) (1.3) 

where 7?X£ denotes the incoming sum-product message along 

the variable / edge Xf computed for O = 9. 

A constant scale factor 7 in g results in a scale factor 7 in 
eVgi^g) which can be ignored. 



where "cx" denotes equality up to a scale factor. Note that 
the missing scale factor in (27) can be locally recovered by 
integrating (27) over xi . . . Xm- It remains to make the step 
from (27) to (1.3) in Table I. 

C. Using Sum-Product Message Passing for the Local Expec- 
tations 

If the factor graph of fsix, 9) is cycle-free (after removing 
the edges for = 9), then the marginals (27) can be 
computed by sum-product message passing (see [4], [5]) in 
this factor graph. As above, let .g(xi, . . . , x„n 9g) be a generic 
node /factor in the factor graph of /b. Then (27) may be 
computed as in (1.3) in Table I, where ~{ixi denotes the 
incoming sum-product message along the variable /edge 
computed for Q = 9. 

For example, we can write (24) as 

PBiXi,X2\9) (X / fc{xi,X2,9i)fB{x2,X3,92)dX3 (28) 



(29) 



fc{xi,X2,9l)'jIx2{X2) 



where /ixg is the right-to-left sum-product message along 
the edge X2 computed for Q = 9. (A constant message 
l^Xiixi) = 1 may be added as a factor in (29).) 

D. Using Max-Product Message Passing for the Maximization 

If /a can be factored into a cycle-free factor graph, then the 
maximization (23) (and its obvious generalization to general 
factor graphs) can be carried out by max-product message 
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|e''2(^2) 
X2 
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Fig. 3. Application of EM to general state space model. 



passing in the factor graph of /a. This applies, in particu- 
lar, to the standard case where /a(^i, 6*2, • • •) expresses the 
equaUty constraint Oi = Oi = . . ., which we will encounter 
in Section IV. 

E. Putting it Together 

Let us summarize the findings of this section by considering 

the factor graph of Fig. 3, which is an easy generalization 
of Fig. 2. Note that removing the edges 6i,...,6„ cuts 
the factor graph (Fig. 3) into two cycle-free components. Let 
9 = {9i, . . . ,9,,), X = {xi,...,Xn), and y = (j/i, . . . ,y„). 
Suppose that we wish to find 



9 = argmax/A(6') / fnix, y, 0) dx 

J X 



(30) 



for fixed known y. In this case, the EM algorithm applies as 
follows: 

1) Make some initial guess 9 = {9i, . . . , 9n). 

2) Perform forward-backward sum-product message pass- 
ing through the factor graph of /b (with 9e plugged into 
fe for e= l,...,n). 

3) Compute the EM messages e'^^^^i^ e''"'^") as in 
Table I. In this case, we obtain 

mm = Ep,^^[logfe{Xt_i,Xe, ye, 9e)] (31) 

where the expectation is with respect to the probability 
density 

Piocai{xe-i,xe\ye,9) oc fe{xi-i,xe,ye,9e) 

■l^Xe-i{xi-l)'TlXe{xe) (32) 

where Jixe-i and *flxe denote sum-product messages. 

4) Compute new estimates 

9 = {9u...,en) (33) 

= argmax ^(^1, ...,9^) e^'^^^^ ■ ■ ■ e^-^'"\ (34) 
{0i,-,en) 

If /a has a cycle-free factor graph, this maximization 
may be carried out by max-product message passing in 
that factor graph. 

5) Repeat 2-4 until convergence or until the available time 

is over 

All this applies to general factorizations of /a and /b 
provided that the resulting factor graphs (without the edges 
6i, ...,©„) are cycle-free. 
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If the factor graphs of /a and /b are not cycle-free, the same 
local computation rules can be used nonetheless and seem to 
work well in some applications, cf. [12J-|18J. 

In many cases, the computation of an EM message ac- 
cording to Table I requires substantial additional work. Pre- 
computed tables of such messages for frequently occuring 
nodes /factors can therefore be useful, as wiU be demonstrated 
in Sections IV and V. 

F. An Issue: Hard Constraints and Grouping 

Nodes in factor graphs often express "hard" constraints [4], 
[5]. For example, the constraint Xi = X2 (for real variables 
Xi and X2) may be expressed by the node/ factor 5{xi — X2), 
where 6 denotes the Dirac delta. It turns out that the EM 
message computation rule of Table I should not be apphed 
to such constraint nodes; the typical outcome of the attempt 
will be a degenerate EM message e''=(^=^ that expresses the 
constraint = 9s, which stalls the EM algorithm. 

For example, assume that Xi, X2, © are real variables and 
the node /factor 

g{xi,X2,0) = 5{xi- X26) (35) 
expresses the constraint Xi = X2Q. Then 

r]{e)(x / g{xi,X2,0)']txi{xi)l^X2{x2) 

■ ^ogg{xi,X2,9) dxidx2 (36) 
JiXi {X29)^X2 {X2) logg{x29, X2, 9) dx2 (37) 




I 

^ X2 



Fig. 4. Linear state space model with unknown coefficient vector = = 
G2 = . . . and white Gaussian input signal Ui,U2, - ■ ■ The figure shows one 
section of the factor graph. The multiplier node denotes the iimer product 
©I'Xj.. The label Af{m, a^) denotes a scalar Gaussian factor with mean m 
and variance cr^. The EM message computation rule is applied to the dashed 
boxes. 



We assume that the input signal Ui,U2, ■ ■ ■ is zero-mean white 
Gaussian noise with variance afj. We observe a noisy scalar 
output signal 

Yk = e^Xfc + Zk (42) 



7?Xi(a;2^) 7^X2 (2:2) logS{x2{9 - 6)) dx2, (38) where 9 is an unknown real column vector and where Zt is 



which is obviously pathological and illustrates the issue. 

It is usually easy to avoid this problem by grouping con- 
straint nodes with adjacent "soft" factors / nodes, as will be 
illustrated in Sections IV and V. 

IV. Examples: Identification of Linear Systems 

The following two examples arise in many appUcations. The 
use of EM to problems of this kind is not new, but neither 
is it trivial [33]-[35]. In communications, the example of 
Section IV-A may arise in channel estimation and the example 
of Section IV-B may arise in estimating the parameters of non- 
white Gaussian noise. 

A. FIR Filter Identification with Unknown Input Signal 

Let Xfe e M", fc = 0, 1. 2, . . . , N, be the time-fc state of a 
finite impulse response (FIR) filter with random input signal 
f/fe e K, = 1, 2, . . . , TV. Specifically, 



zero-mean white Gaussian noise with variance a^- From the 
observations Yk = yk, k = 1, 2, . . . , TV, we wish to estimate 
9. Specifically, we wish to compute the maximum-likelihood 
estimate 



9 = argmaxp(2/| 9) 



(43) 



with n X n matrix 



Xk = AXk-i + bUk 





In-1 



(39) 



(40) 



(where is the (n — 1) x (n— 1) identity matrix) and with 



6=(1,0,...,0)' 



(41) 



= argmax / / / p{u,x,y, z\9) dz dx du, (44) 

9 J u J X J z 

where y is defined as y = (yi, . . . , 2/jv) and where u, x, z are 
defined analogously. 

The factor graph of this system model, i.e., of 

p{u,x,y,z\9) 

N 

= Pi^o) Yi P{yk\xk, Zk,9)p{zk)p{xk\xk-i,Uk)p{uk), (45) 
fe=i 

is shown in Fig. 4. Note that the unknown coefficient vector 
Q appears in copies Qk, k = 1,2, . . . ,N (one copy for each 
time k) with an equality constraint 61 = . . . = 6jv- Note also 
that the factors p{xk\xk-i,Uk) and p{yk\xk, Zk,9) express 
the constraints (39) and (42), respectively; only the scalar 
Gaussian factors p{uk) and p{zk) are "soft" factors without 
Dirac deltas. The factor p{xq) (not shown in Fig. 4) is of 
secondary importance and may even be omitted in practice. 

Note that the edges 9^, fc = 1, 2, . . . , cut the factor graph 
into two cycle-free components. The equaUty constraints 9i = 



6 



TABLE II 

Gaussian MESSAGE PASSING BACKWARDS through a multiplier. 

X AND © ARE REAL COLUMN VECTORS AND S = ©^X IS A SCALAR. 
M{m, a^) DENOTES A SCALAR GAUSSIAN FACTOR WITH MEAN m AND 
VARIANCE (T^. The INCOMING SUM-PRODUCT MESSAGE 7?X IS 

Gaussian WITH parameters Wx and mx- 



X 
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N{ms,<J%) 



e'''*' is Gaussian with 

We = 
We me = 



Vx + rnxm"^ 
mxms 



with Vx and mx given by 

v-"^ = Wx + oe'^/al 

Wxmx = Wxmx +9ms/a%- 



(n.i) 
(n.2) 



(n.3) 
(n.4) 



62 = ... at the bottom of Fig. 4 correspond to /a in Figures 
2 and 3; everything else in Fig. 4 corresponds to /b in Figures 
2 and 3. 

With estimates Ok plugged in, the upper part (the /b part) 
of Fig. 4 becomes a standard Unear Gaussian factor graph, 
where sum-product message passing amounts to Kalman fil- 
tering/smoothing [5, Section V]. 

We now need to compute the EM messages e'"'^^'=\ Heeding 
the advice of Section 111-F, we group the multiplier node 
(which is a hard constraint) with the adjacent soft node/ factor 
p{zk) oc /{"^"z) as indicated by the dashed boxes in 
Fig. 4; this grouping (and integrating /marginalizing over the 
variables inside the box) results in the factor 



gk{xk,yk,Ok) 



Vk) 



a e 



-{elxk-ykf/{2al) 



(46) 
(47) 



which is perfectly well-behaved. Note that the missing scale 
factor in (47) can be safely ignored, cf. Table I. 

As it turns out, the EM message e'"'^^''^ out of the dashed 
box gk in Fig. 4 is Gaussian with weight matrix (inverse 
covariance matrix) Wq^ and mean vector tuq^. as given by 
(II.1)-(II.4) in Table II with mg = yk and = a%. The 
proof of (11.1)-(II.4) is given in Section V. 

It remains only to compute new estimates Ok by max- 
product message passing through the chain of equality con- 
straints at the bottom of Fig. 4. Since the incoming EM 
messages e'"''^'') are Gaussians, max-product message passing 
coincides with sum-product message passing with message 
computation rules as in Table 2 of [5]. 



In summary, both the expectation step and the maximization 
step of the EM algorithm can be carried out by Gaussian 
message passing. 

B. Autoregressive Filter Identification 

Consider the following state space representation of an 
autoregressive model. Let the state Xk G M", k = 1,2,. . . ,N 
evolve according to 



with 



Xk = AXk-i + bUk 
Of 



6= (1,0, 
and with nx n matrix 



A(e) 



in-i 



(48) 
(49) 

(50) 



where B is an unknown column vector of dimension n. We 
assume that the input signal Ui,U2, ■ ■ ■ , which is often called 
"innovation", is zero-mean white Gaussian noise with variance 
cr^. We observe a noisy scalar output signal 



Yk = {l,0,...,OfXk + Zk, 



(51) 



where Zi,Z2,--- is zero-mean white Gaussian noise with 
variance cr|. From the observation Yk = yk, k = 1,2, ... ,N, 
we wish to estimate ©; specifically, we wish to compute the 
maximum likelihood estimate 



6 = argmaxp(yl^) 



argmax 



p{u, X, y,z\0) dz dx du 



(52) 

(53) 



with y = iyi,y2, ■ ■ ■,yN) etc. 

The factor graph of p{u,x,y,z\0) is shown in Fig. 5. As 
in the previous example, the unknown parameter vector O 
appears in copies 9i = . . . = Oat, one copy for each time k. 

Again, for fixed Q = 0, this factor graph is linear Gaussian 
and cycle-free. 

The EM message computation rule of Table I may be 
apphed to the dashed box in Fig. 5. It turns out that the 
EM message e'"=(^'=^ is Gaussian with mean and weight 
matrix (inverse covariance matrix) Wq^ given by (III.7) and 
(III.8) in Table III. 

Again, we have obtained a purely Gaussian message passing 
algorithm. Apart from the EM message e^''^^''\ all messages 
can be computed as described in [5, Section V]. 

C. Remarks 

We conclude this section with some remarks on these 
examples. 

1) In order to make the described algorithms work in 
practice, it is necessary to pay attention to the scheduling 
of the message updates. A serial (left-to-right) sched- 
ule may actually work better than alternating forward- 
backward sweeps in the two components (corresponding 
to /a and /b) of the factor graph, cf. [30]. 
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Fig. 5. Linear state space model for autoregressive filter with b = c = 
(1, 0, . . . , 0)-'', with unknown coefficient vector ©, and with scalar white 
Gaussian innovation Ui,U2, ■ ■ ■ The figure shows one section of the factor 
graph. The multipUer node denotes the product j4(0)Xfc (50). The EM 
message computation rule (III.7) and (III.8) appfies to the dashed box. 



2) The point of these examples is only to illustrate the 
message passing view of the EM algorithm; we are not 
concerned here with analyzing and comparing different 
approaches to hnear-system identification [31]. 

3) Tabulated message computation rules (as in Table II) can 
greatly simplify the derivation of EM message passing 
algorithms. 

V. Gaussian Message Passing 

Through Multiplier Nodes 

A substantial part of traditional signal processing is essen- 
tially equivalent to Gaussian message passing in Unear models 
[5]. Unknown coefficients in such models introduce multiplier 
nodes into the corresponding factor graphs as is exemplified 
by Figures 4 and 5. 

The EM message out of such multiplier nodes, properly 
grouped with "soft" Gaussian nodes /factors as in Figures 4 
and 5, is invariably Gaussian (up to a scale factor), but the 
computation of its mean and its covariance matrix (in terms 
of the parameters of the incoming Gaussian messages) can 
be involved, cf. Appendices C-E. It is therefore helpful to 
tabulate such messages as exemplified by Table II. 

However, such multiplier nodes come in surprisingly many 
versions: scalar times scalar, scalar times vector, inner product 
of two vectors (as in Fig. 4), general matrix times vector, 
products involving matrices with a special structure (as in 
Fig. 5), etc. Moreover, the grouping of such multipUer nodes 
with suitable soft factors /nodes is another source of virtually 
endless variety. 

We will therefore confine ourselves to a small number 
of cases which appear to be particulary useful and widely 



appUcable. The general setup is shown in Table III and the 
results are given in Tables III and IV. In all cases, we have a 
multiplier U = A{B)X, where ^(9) is a matrix that depends 
on ©, grouped with Y = U + Z, where Z is zero-mean 
Gaussian with covariance matrix Vz = W^^ (or ctI in the 
scalar case). In all cases, we assume that Gaussian messages 
7?x and '/xy arrive via the edges X and Y, respectively; these 
incoming messages are parameterized by the mean vectors 
mx and my and the covariance matrices Vx = and 
Vy = Wy^, respectively. The following cases are considered: 

1) Inner product: ^(8) = 8^, both 8 and X are real 
column vectors (of the same dimension), and both U = 
Q^X and Y are real scalars. 

This case is a generalization of Table II, as will be 
discussed at the end of this section. 

2) Real scalar 6 times real column vector X: A{Q) = O 
and both U = QX and Y are column vectors. 

Some pertinent properties of the trace operator ("tr") are 
recalled in Appendix B. 

3) Componentwise product (denoted by 8 X) of real 
column vectors 6 and X: A{Q) = diag(8), a diagonal 
matrix with the elements of 6 on the diagonal, and both 
U = Q Q X and Y are column vectors. 

4) Autoregression: Q,X,Y are column vectors in R" and 
A{Q) is the square matrix (50) (which is essentially 
a companion matrix). In addition, Z is a zero-mean 
Gaussian vector with covariance matrix 



/-I 



Vz = 







V 



\ 



(54) 



i.e., Z is effectively a scalar that affects only the first 
component li of y. 
5) General real matrix G times real column vector X: 
A{e) = 8 and both U = QX and Y are column 
vectors. 

The symbol "(g)" in (III.9) and (III. 10) denotes the 
Kronecker product, cf. (124)-(125). More about this 
case is said below. 

The case of scalar G times scalar X is a common special 
case of all these cases and does not need to be considered 
separately. 

In the cases 1^, where G is a column vector (or a scalar), 
the EM message e^^^^ is Gaussian with mean vector me and 
weight matrix (inverse covariance matrix) Wq as given in 
Table III. 

In Case 5, where G is a matrix, we need the following 
notation. Let B be any m x n matrix and let 



bi 



B = 



(55) 



be the decomposition of B into its rows. We will use both the 
row stack vector 



rvect(B) = . . . ,6^) 



(56) 
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TABLE III 

Gaussian backward EM messages e''(^) through some multiplier 
NODES, SEE Section V. The EM message e''^^' is always Gaussian 
(UP to a constant scale factor) wn II parameters We and me as 
stated. See also Table IV. 



TABLE IV 

Computation of means mx and ray and covariance matrices Vx 




Inner product of column vectors G and X, 

A{e) = e^: 



We 



: O: 



(in.i) 
(in.2) 



Scalar times column vector X, A(S) 

1/^1 = tr {WzVx) + m\Wzm.x (ni.3) 

me/'o^e = {WzVxyt) + ra\WzmY (in.4) 

Componentwise product X of column vectors and X, 
A(&) = diag(0): 

We = Wz& (Vx + mxm'^) (HI.S) 
WemQ = (Wz (VxYT +mxm^)) 

• (1, 1, ... , 1)^ (in.6) 

Autoregression, see (50) and (54): 

We = (Vx + mxmj,'^ (in.7) 

Weme = o-^^ (VxYi + mxruYi) (ni.8) 

General matrix times column vector X, A(0) = 0: 
evW is Gaussian in rvect(5)^ with 

We = Wz^ {Vx +mxm^) (in.9) 

Weme = (Wz In) cvect{VxYT + mxruY) (IILIO) 



and the analogous column stack vector cvect(i?), where 
the columns of B are stacked into one column vector. For 
example, if 

B = ( Ji'i I''' \ (57) 

V ^2,1 ^2,2 J 

then rvect(B) = bi,2, &2,i, ^2,2) and cvect(B) = 

^2.1, ^1,2; ^2,2)"^- With this notation, the EM message is 
Gaussian in rvect(6)^ with parameters (III.9) and (III. 10) 
(see also (132)). 
Note that Table III gives the analog of (II.l) and (II.2) in 



Auxiliary quantities: 

Wx=Wx+ A{e)'^ (Vz + (IVl) 

(IV.2) 



Vy = A{e)VxA{e)'^ + Vz 

Wy = (Vy + VyY^ 
Quantities in Table 111: 

Vx = 

= Vx- VxA{e)'^WYA{§)Vx 
VxYT = VxA0)'^WyVy 

mx = Vx (wxmx + Ai§f (Vz + Vy 



(1V.3) 

(1V.4) 
av.5) 

av.6) 



my 
(IV.7) 



= (in - VxA{efwYA{e)) 

■ (tUx + VxA{ef (Vz + Vy)'^ 

(IV.8) 

niY = Vy (Wytuy + Wy W) (IV.9) 
= (Jm - Vy Wy) (my + VyWYmY^ (TV.IO) 



Table II; the analog of (II. 3) and (II.4) is Table IV, which 
gives expressions for the marginal means nix and my and 
for the covariance matrices Vx and Vxyi" for fixed9 = ^ in 
terms of the parameters rrtx and Vx and my and Vy of the 
incoming Gaussian sum-product messages. Note that Table IV 
applies to all the cases in Table HI simultaneously. 

The proofs of the claims in Table III are given in Ap- 
pendix C and the proofs of the claims in Table IV are 
given in appendices D and E. Not surprisingly, some of these 
derivations are essentially equivalent to similar computations 
in the EM literature [33]-[35]. Nevertheless, most of the 
statements in Tables HI and IV do not seem to be readily 
available in the prior literature. 

We conclude this section by considering the specialization 
of Case 1 (iimer product) toY = y fixed, which results in the 
situation of Table II. In this case, we have 



and 



my = my = y 



Vxy = Vy = Vy=0. 



(58) 



(59) 



With the translations ms 



m.Y and (t| = cr|, it is 



obvious that (III.l) and (III.2) specialize to (11.1) and (II.2), 
respectively. Moreover, with ^(^)'^ = 6, it is obvious that 
(II.3) follows from (IV.l) and (II.4) follows from (IV.7). 

VI. Conclusions 

We have showed that EM may be viewed and used as 
a message passing algorithm in factor graphs, and we have 
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identified a general "local" EM message computation rule (Ta- 
ble I). In some important cases, the EM messages are tractable 
expressions, which was exemplified by the EM message out 
of multipliers (arising from unknown coefficients) in linear 
Gaussian models. 

As a fuU member of the family of message passing algo- 
rithms, it is easy to seamlessly combine expectation maxi- 
mization with other message passing algorithms in interesting 
ways. In particular: 

• EM messages (like all messages) may be represented 
in many different ways (including Gaussians as in Sec- 
tions IV and V, Gaussian mixtures [28], particles [32], 
etc., leading to quite different actual computations. 

• The freedom (or the necessity) to choose some definite 
message update schedule leads to different algorithms 
with different performance; more about this will be said 
in [28]. 

• The maximization step amounts to applying the max- 
product algorithm to the corresponding subgraph, which 
in turn may be carried out by many (exact or approxiate) 
message passing algorithms. For example, in some impor- 
tant applications (as, e.g., in Section IV), the maximiza- 
tion step can be done by Kalman filtering /smoothing. 

• The expectation step relies on plain sum-product mes- 
sages. However, depending on the involved nodes and 
message types, the sum-product algorithm may be real- 
ized (exactly or approximately) in many different ways, 
cf. [5, Section VI]. 

Moreover, it is a general observation that tabulated mes- 
sage computation rules can greatly simplify the derivation of 
message passing algorithms [5]. This applies, in particular, to 
EM messages, which we have tabulated for various multiplier 
nodes (scalar, vector, general matrix, . . . ) with incoming 
Gaussian messages. With these message tables, EM algorithms 
for a number of basic linear-system identification problems 
can easily be composed without additional derivations or 
computations. More such tables will be given in [28]. 

Appendix A 
Proof of Equation (6) 

We give a variation of a standard proof (cf. [9]) that is 
adapted to the setup of Section II. The heart of the proof is 
the following fact. 

Lemma: The function 

m 0) = m + [ fix, 0) logf ) dx (60) 

(where "log" denotes the natural logarithm) satisfies both 

f{0,O)<f{O) (61) 

and 

f{0,e) = f{e). (62) 



□ 



Proof: The equality (62) is obvious. The inequality (61) 
follows from eUminating the logarithm in (60) by the inequal- 
ity log(a;) < a; — 1 for X > 0: 

f{e, 6) < f{6) + [ fix, 9) l^-p^^ - 1 ) dx (63) 



fi9)+ / f{x,6)dx- / f{x,e)dx (64) 

J X J X 

f{0). (65) 



To prove (6), we first note that (5) is equivalent to 



^(fc+i) ^argmaxM^^'^)). 
e 



We then obtain 



< f{e^^+''\e^^^) 

< f{e^^+^\ 



(66) 



(67) 
(68) 
(69) 



where (67) follows from (62), (68) follows from (66), and (69) 
follows from (61). 

Appendix B 
Some Properties of the Trace Operator 

We recall some pertinent properties of the trace operator 
for use in Appendix C-B. The entries of a matrix A will be 
denoted by a^^^. The trace of a square matrix A is the sum of 
the diagonal elements of A: 



(70) 



For matrices A and B such that AB is a square matrix (i.e., 
B has the same dimensions as A^), we have 



fc I 
iv{BA). 



(71) 

(72) 



In particular, if x and y are column vectors (with the same 
number of rows), we have 



x y = y x = ty:(xy ). 

Moreover, for W = A^A, we have 

x^Wy = {AxfAy 

= tv{Ax{Ayf) 
= tv{Axy'^A'^), 

and using (72) we further obtain 

x'^Wy = tT{Wxy'^) 
= tT{xy'^W). 



(73) 

(74) 
(75) 
(76) 



(77) 
(78) 



Now let X and Y be random column vectors with the same 
dimensions. Let nix = ^[X] and my = E[^] and 



VxY-r =E[{X -mx){Y - myf 



(79) 
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Then, for any square matrix W as above (i.e., W = A^A) 
with suitable dimensions, we have 

E[X^WY] 

= E[{X- mxfW{Y - my)] + m^Wmy (80) 

= E[tr(VF(X - mx){y - myf)] + m^Wmy (81) 

= tr {WVxyT) + m^Wmy. (82) 



Appendix C 
Proofs of the Claims in Table III 

Recall (for repeated use below) that the probability density 
function of an n-dimensional real Gaussian random vector 



becomes 



^^(x-m)^W{x-m) 



oc e 



(27r)" 

^{x'^Wx-2x'^Wm) 



(83) 

(84) 



where m is the mean vector and W = (a positive definite 
matrix) is the inverse of the covariance matrix V. In the scalar 
case (n = 1), we will also use the notation cr^ = V. 

Now consider the factor graph in Table III. The closed-box 
function g{x,y,9) is obtained by marginalization/ integration 
over the variables inside the dashed box: 

9{x,y,e) 



= I S{u-A{0)x)J^^:^e-i^y--^^^^^y--Uu (85) 



/ 

J u 



(27r)" 



_ dctjWz) i(y_A(9)x)'^Wz(v-A(e)x) 

V (2^)" 

The exponent (I.l) of the EM message e''*^^-' is 



V{e)=E[\ogg{X,Y,6)] 
1 fdet{Wz) 



(86) 



(87) 



= o log 



V (27r)" 



- -E[(Y - A{9)XfWziY - A{e)X)] (88) 
= const- ^(E[{A{0)XfWz{A{0)X)] 

-2E[{A{e)XfWzY]y (89) 

where all logarithms are natural, where the expectation is over 
X and Y (with respect to the local probability (137)), and 
where "const" subsumes all terms that do not depend on 0. 

We are now ready to discuss the individual cases of Ta- 
ble in. 



A. Inner Product of Column Vectors 9 and X 

In this case, we have A{0) = 0'^. The quantities 0'^X, Y, 
and Wz are scalars; in particular, {0'^X)'^ = 0'^X. Thus (89) 



rj{0) = -^(E[{0^XfWz{e'^X)] - 2E[{e^ XfWzY\ 
+ const 

= - ^(E[e^XWzX^0] - 2E[0'^ XWzY]^ 
+ const 

= -^(0'^E[XWzX^] - 20'^E[XWzY]j 



(90) 
(91) 



-I- const. 



(92) 



It is then obvious from (84) that the EM message e^^^^ is 
Gaussian (up to a scale factor) with weight matrix 



Ws = E[XX^] a 
_Vx + mxm^x 



-2 

z 

T 



and 



WemQ=E[XY] a'^ 

Vxy + mxmy 



(93) 
(94) 

(95) 
(96) 



B. Scalar Q Times Column Vector X 

In this case, we have A{6) = 9, a scalar, and (89) becomes 

r){9) = const - i (9^E [X'^WzX] - 29E [X'^WzY] ) . (97) 

It follows from (84) that e^^^^ is Gaussian with 

"Fq^ = E[X^WzX] 

= tr {WzVx) + m^Wzmx 

and 

tfis/'^& = E[X^WzY] 

= tr (WzVxyT) + mxWzmy 



(98) 
(99) 

(100) 
(101) 



where (99) and (101) follow from (82) and with Vxyr defined 
as in (155). 

C. Componentwise Product Q @ X of Column Vectors 

In this case, we have A{0) = diag(^), a diagonal matrix 
with the elements of on the diagonal, and (89) becomes 

r]{0) = const - ^ (E[(diag(^)X)^Wz(diag(^)X)] 

- 2E[{di&g{0)XfWzY]) (102) 
= const - ^(E[{diag{X)9fWzidi&giX)9)] 

- 2E[{diag{X)0 fWzY]'^ (103) 
= const - ^ (6l^E[diag(X)Wz diag(X)] 

- 26l'^E[diag(X) WzF]). (104) 
It follows from (84) that e^^^^ is Gaussian with 



We = E[diag(X)Wz diag(X)] 
= WzQE[XX'^] 
= WzQ {Vx + mxmjr) 



(105) 
(106) 
(107) 
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and 

We me 



= E 
= E 



di&g{X)WzY] 



diag(X)W^zdiag(F)(l,l,...,l) 



(108) 
(109) 

= {WzQE[XY^]){l,l,...,lf (110) 
= {Wz {VxYT + mxm^)) (1, 1, ... , if. (Ill) 



D. Autoregression ( Companion Matrix) 
In this case, recall from (50) that 

0^ 



Aie) 







(112) 



where n is the dimension of the column vector 9, and where 
/„_i is the (n — 1) x (n — 1) identity matrix. 

Before we proceed, we need to address the following issue. 
According to (54), we have 



/ a% 



Vz 



z 

e 
e 



V 





/ 



(113) 



with e = 0, which creates a problem with Wz = . We 
address this problem by proceeding with (113) with e > 0. As 
it turns out, the resulting expression for r}{6) does not depend 
on £ (except in an additive constant, which we ignore). 
Using (112), (89) becomes 



'q{6) = const - 



E 



/ 0^X 




1 


( e^x 


\ 


Xi 




Xi 






Wz 






\ Xn~l 






\ Xn-l 


J 



2E 



/ 0^X \ 

Xi 



WzY 



(114) 



V Xn-i / 

Using (113) and ignoring all constant terms yields 
ri{6) = const 

- ^(E[0'^Xaz^9'^X] - 2E[0^Xaz^Yi] ) (115) 
= const 

- ^(^e^a^^E[XX^] 6 - 2e''a^^E[XY^]). (116) 
It follows from (84) that e''^''' is Gaussian with 



We = <Jz^E[XX'^] 



= Or, 



{Vx + mxmFx) 



and 



We'^ = o-z^E\XY^\ 

= Uz'^ (VxYi + mxmYi) ■ 



(117) 
(118) 



(119) 
(120) 



E. General Matrix 9 Times Column Vector X 

We need to begin with some preparations. Recall the row 
stack operator rvect (56) and the corresponding column stack 
operators cvect. Let A be an m x n matrix with rows 
fli, . . . . am- For any column vector x G M" and any my. m 
square matrix W (with elements Wk/), we have 

(aix \ 
: (121) 
amX ) 



fc=l £=1 
m m 

k=i e=i 
= (ai, . . . , am) 



\ Wm,lXX^ 



= rvect(A) (W (g) xx'^) rvect(A)^. 
Moreover, for any column vector y G R™, we have 

/ yi 

{AxfWy = {aix, amx) W : 

\ Vm 

ra ra 

= ^ ^ akxwk,eye 
fc=i e=i 

m m 

= ^ ^ akWk,i.xyi 

fe=i t=i 
= (ai, . . . , am) 

I Wi^iln . ■ . Wi^rrJn \ ( Xyi 



\ xy„ 



(122) 
(123) 

(124) 
(125) 

(126) 

(127) 
(128) 



(129) 
(130) 



= rvect(^) (W (g) In) cvect{xy'^). 

After these preparations, we return to the EM message for 
the case where A{9) = is a general m x n matrix. In this 
case, (89) becomes 

r/(e) = const- i (e [{eX)'^Wz {&X)] -2E [{eX)^WzY] ) 

(131) 

and using (125) and (130) we obtain 
?7(e) = const - ^(rvect(e)E[Wz O XX"^] rvect(e)'^ 

- 2 rvect(e) E [{Wz O /„) cvect(Xy'^)] ) . (132) 
We now see that e^^®^ is Gaussian in rvect(©)^ with 

We = Wz O E [XX'^] (133) 
= Wz {Vx + mxm^x) (134) 

and 

Weme = {Wz <^ /„) cvect(E[Xy'^] ) (135) 
= (Wz ® /„) cvect{VxYT + mxm^). (136) 
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^(o,yz) 



M{Q,Vz) 



X 



A 



U 



Fig. 6. Factor graph for Appendix D. 

Appendix D 

Proofs of the Claims in Table IV Except (IV.6) 

We consider the computation of the mean vectors mx and 
niY and the covariance matrix Vx with respect to the local 
probability density (1.3) 

P\ocM{x,y\9) (X g{x,y,9)~jlx{x)^Y{y) (137) 

with g{x,y,6) as in Table HI (see also (86)) and where 'Jix 
and hy are the incoming ^ Gaussian suni-|)roduct messages 
with parameters rrix and Vx (or Wx = ^x^) and my and 
Vy (or Wy = Vy^), respectively. 

Throughout this section, 6 = ^ is fixed and we will simply 
write A instead of A{9). The factor graph of Table 111 then 
reduces to the factor graph of Fig. 6. The desired quantities 
may be obtained by Gaussian sum-product message passing 
in this factor graph. In the following computations, we will 
frequently use Tables 2 and 3 of [5] without special notice; 
the reader is advised to have these tables at hand. 

Equation (IV. 1) follows from 

Wx =Wx + Wx (138) 
= Wx + A^WuA (139) 

(140) 



Wx+A'^(Vz+Vy) ^A. 



Equation (IV.2) is immediate from 

Vy=Vu + Vz 

= AVxA^ + Vz. 



(141) 
(142) 



Equation (1V.3) is the definition of W as in [5, eq. (56)]. 
Equation (IV.5) follows from [5, (1.4) and (III.8)]: 



Vx = Vx-VxWxVx ^ 
= Vx- VxA^WyAVx- 
Equation (IV.7) follows from 

Wxfnx = Wxmx + Wxrnx 

= Wxmx + A^Wumu 

= Wxmx + A^ (Vz 

Using (144) and (147), Equation (IV.8) foUows from 

mx = VxWxmx (148) 



Vy] rriY- 



(143) 
(144) 

(145) 
(146) 

(147) 



= (Vx-VxA'WyAVx) 

■ (wxmx + A'^ (Vz + Vy) ^ 
= (/„ - VxA'^WyA) 

■ (mx + VxA'^ (Vz +Vy) ^ mY 



(149) 



(150) 



B 



X 



A' 



C 



Y 



Fig. 7. Factor graph for Appendix E. 

Equation (1V.9) is immediate from 

WYmY = Wyttiy + WYrUY- (151) 

Finally, Equation (IV. 10), is obtained using [5, (eq. 1.4)]: 

ruY = VyWYmY (152) 
= {Vy - VyWyVy) {Wytuy + VKy W) (153) 

= (im - VyWy) (niY + VyWytKy) • (154) 

Appendix E 
Proof of (IV.6) 

We need to compute the covariance matrix 

VxYT =E[{X- mx){Y - mYf] (155) 

with respect to the local probability density (137). Consider 
the factor graph shown in Fig. 7 with block matrices 

A 

In 



A'^ 



(156) 



B = 







C={lm, ) 



(157) 
(158) 



where n and m are the dimensions of the column vectors 
X and Y, respectively. This factor graph is obtained from 
the factor graph in Table III by stretching the variable X 
accross the adder node so that the variables X and Y now 
appear jointly as components of the vector (F^, X^)^ on the 
correspondingly labeled edge. The closed-box function g{x, y) 
in Fig. 7 equals the closed-box function g{x, y, 6) in the factor 
graph in Table III. 

The desired matrix VyyT is the lower left comer of the 
covariance matrix 



Vy Vx TY 
y^yT Vx 



(159) 



which can be computed by Gaussian sum-product message 
passing in Fig. 7. As in Appendix D, we wiU use Tables 2 
and 3 of [5] without special notice. We have 



V^u^^ =A'Vx{A') 



AVxA^ AVx 
VxA^ Vx 



(160) 



(161) 
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and 



^ ( AVxA^ + Vz AVx 
I VxA^ Vx 



We also have 



(162) 
(163) 

(164) 
(165) 



and the Matrix Inversion Lemma (see, e.g., [5, eq. (181)]) 

yields 



{Vy + AVxA^ + Vz) ^CV^-,^^ (167) 



V, 



AVxA^ + Vz 
VxA^ 



■ {Vy + AVxA^ + Vz) ' 



■ [aVxA^ + Vz, AVx). (168) 
The lower left corner of this matrix is 

VxYT = VxA^ - VxA^ (Vy + AVxA^ + Fz)"' 

• (jWxA^ + Fz) (169) 
= VxA^ i^Y + J^xA^ + T/z)"' 

• ( (Vy + aVx^^ + Vz) 

-(aVxA^ ^Vz^) (170) 
= VxA^ (aYxA^ + Vz + Vy)'^ Vy (171) 



and using 



Wy = [aVxA'^ + Vz+ Vy) ' (172) 



from (1V.3) and (1V.2) yields (IV.6). 
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